Second-order Temporal Pooling for Action Recognition

Authors

Anoop Cherian

Stephen Gould

Abstract

Most successful deep learning models for action recognition generate predictions for short video clips, which are later aggregated into a longer time-frame action descriptor by computing a statistic over these predictions. Zeroth-order (max) or first-order (average) statistics are commonly used. In this paper, we explore the benefits of using second-order statistics. Specifically, we propose a novel end-to-end learnable action pooling scheme, temporal correlation pooling, that generates an action descriptor for a video sequence by capturing the similarities between the temporal evolution of per-frame CNN features across the video. Such a descriptor, while being computationally cheap, also naturally encodes the co-activations of multiple CNN features, thereby providing a richer characterization of actions than its first-order counterparts. We also propose higher-order extensions of this scheme by computing correlations after embedding the CNN features in a reproducing kernel Hilbert space. We provide experiments on four standard and fine-grained action recognition datasets. Our results clearly demonstrate the advantages of higher-order pooling schemes, achieving state-of-the-art performance.
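The contrast between zeroth-, first-, and second-order pooling can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the mean-centering and unit-norm normalization, and the choice to keep only the upper triangle of the symmetric matrix are all assumptions made for illustration. The input `frames` is assumed to be a list of per-frame feature vectors (T frames, d features each), and plain Python is used to avoid any dependency.

```python
import math


def max_pool(frames):
    """Zeroth-order statistic: per-feature maximum over all frames."""
    return [max(column) for column in zip(*frames)]


def average_pool(frames):
    """First-order statistic: per-feature mean over all frames."""
    return [sum(column) / len(column) for column in zip(*frames)]


def temporal_correlation_pool(frames, eps=1e-12):
    """Second-order statistic (illustrative): correlation between the
    temporal trajectories of every pair of features.

    The centering and normalization used here are assumptions, not
    details taken from the paper.
    """
    trajectories = []
    for column in zip(*frames):  # one column = one feature over time
        mean = sum(column) / len(column)
        centered = [value - mean for value in column]
        norm = math.sqrt(sum(value * value for value in centered))
        trajectories.append([value / (norm + eps) for value in centered])

    # The correlation matrix is symmetric, so keep the upper triangle only.
    d = len(trajectories)
    descriptor = []
    for i in range(d):
        for j in range(i, d):
            dot = sum(a * b for a, b in zip(trajectories[i], trajectories[j]))
            descriptor.append(dot)
    return descriptor


# Hypothetical scores for 3 frames and 3 features.
frames = [
    [0.1, 0.9, 0.2],
    [0.4, 0.6, 0.2],
    [0.8, 0.2, 0.3],
]
print(max_pool(frames))
print(average_pool(frames))
print(temporal_correlation_pool(frames))
```

For d features, max and average pooling each return d numbers, whereas this second-order sketch returns d(d+1)/2 numbers, one per feature pair. Each entry records whether two features rise and fall together over the video, which is the co-activation information that the first two statistics discard.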


Similar resources

Order-aware Convolutional Pooling for Video Based Action Recognition

Most video based action recognition approaches create the video-level representation by temporally pooling the features extracted at each frame. The pooling methods that they adopt, however, usually completely or partially neglect the dynamic information contained in the temporal domain, which may undermine the discriminative power of the resulting video representation since the video sequence ...


Non-Linear Temporal Subspace Representations for Activity Recognition

Representations that can compactly and effectively capture the temporal evolution of semantic content are important to computer vision and machine learning algorithms that operate on multi-variate time-series data. We investigate such representations motivated by the task of human action recognition. Here each data instance is encoded by a multivariate feature (such as via a deep CNN) where act...


Pooling the Convolutional Layers in Deep ConvNets for Action Recognition

Deep ConvNets have shown good performance in image classification tasks. However, deep video representation for action recognition remains a problem. The problem comes from two aspects: on one hand, current video ConvNets are relatively shallow compared with image ConvNets, which limits their capability of capturing the complex video action information; on the other hand, tempor...


Sequence Summarization Using Order-constrained Kernelized Feature Subspaces

Representations that can compactly and effectively capture temporal evolution of semantic content are important to machine learning algorithms that operate on multi-variate time-series data. We investigate such representations motivated by the task of human action recognition. Here each data instance is encoded by a multivariate feature (such as via a deep CNN) where action dynamics are charact...


Face Identification with Second-Order Pooling

Automatic face recognition has seen significant performance improvements through the development of specialised facial image representations. On the other hand, generic object recognition has rarely been applied to face recognition. Spatial pyramid pooling of features encoded by an over-complete dictionary has been the key component of many state-of-the-art image classification systems. Inspired by i...



Journal title:

CoRR

Volume abs/1704.06925, issue -

Pages -

Publication date: 2017
